Goto

Collaborating Authors

 multistage accelerated stochastic gradient method


Reviews: A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems

Originality: This paper provides a clear and deep analysis of a multi-stage accelerated SGD algorithm. The results show that the expected function value gap is bounded by an exponential decay term plus a sublinear decay term related to noise. They recover the deterministic case in the single stage and zero noise special case, while reaching the lower bound O(\sigma 2/n) in the noise term. The paper contains sufficient novel results and is competitive comparing with related work. In particular, the main results reveal how to choose the right time to switch from constant stepsize to decaying stepsize, a crucial choice for the overall performance of stochastic algorithms.


Reviews: A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems

This paper designs a multistage SGD algorithm that does not need to know noise and optimality gap at initialization and yet obtain optimal convergence rates. This is a well written paper with good results.


A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems

We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's method with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds.


A Universally Optimal Multistage Accelerated Stochastic Gradient Method

Neural Information Processing Systems

We study the problem of minimizing a strongly convex, smooth function when we have noisy estimates of its gradient. We propose a novel multistage accelerated algorithm that is universally optimal in the sense that it achieves the optimal rate both in the deterministic and stochastic case and operates without knowledge of noise characteristics. The algorithm consists of stages that use a stochastic version of Nesterov's method with a specific restart and parameters selected to achieve the fastest reduction in the bias-variance terms in the convergence rate bounds. Papers published at the Neural Information Processing Systems Conference.